Barq: distributed multilingual internet search engine with focus on Arabic language
Abstract
Barq is a distributed multilingual search engine with a focus on the Arabic language. Over a period of about two years, the Barq R&D project has involved work on Arabic language processing, Arabic word root extraction, indexing, information retrieval, automatic categorization, focused crawling, distributed computing, distributed database systems, and performance tuning. Barq indexes all web documents (and, optionally, those of a particular site), including Word and XML documents, that contain at least one Arabic word in the CP1256, UTF-8, ISO8859_6, ASMO 449, or ASMO 708 code set. The documents may also contain characters from Latin-based scripts. This paper focuses on the architecture and design patterns of Barq, as well as the various types of search it supports. Stemming/Arabic root extraction, indexing, ranking, precision and recall measurement, automatic categorization, and related issues are also presented, but their details are described in other works.
♣ This work was supported financially by Alakhawayn University in Ifrane, Morocco, under R&D Grant RPF1/2001, and by CoreSoft SARL.
* 0-7803-7952-7/03/$17.00 © 2003 IEEE.
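The abstract states that Barq indexes only documents containing at least one Arabic word in one of several code sets. The following is a minimal Python sketch of such an indexing filter, not Barq's actual implementation. The function names, the fallback order of encodings, and the use of a single Arabic letter as a stand-in for "an Arabic word" are all assumptions. ASMO 708 is assumed to be covered by the ISO 8859-6 codec, and the 7-bit ASMO 449 code set is not handled.

```python
import codecs
import unicodedata
from typing import Optional, Tuple

# Hypothetical fallback order used when no charset is declared: strict UTF-8
# first (valid multi-byte UTF-8 rarely occurs by accident), then the
# single-byte Arabic code pages. ASMO 708 is assumed equivalent to ISO 8859-6;
# ASMO 449 (a 7-bit code) is not handled by this sketch.
FALLBACK_ENCODINGS = ("utf-8", "iso8859_6", "cp1256")


def is_arabic_letter(ch: str) -> bool:
    """True for letters in the Arabic (U+0600-U+06FF) or Arabic Supplement
    (U+0750-U+077F) blocks; digits, punctuation and diacritics are excluded."""
    cp = ord(ch)
    in_block = 0x0600 <= cp <= 0x06FF or 0x0750 <= cp <= 0x077F
    return in_block and unicodedata.category(ch).startswith("L")


def decode_document(data: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Decode raw bytes, preferring a charset declared by the server or page."""
    if declared:
        name = codecs.lookup(declared).name  # raises LookupError if unknown
        return data.decode(name, errors="replace"), name
    for enc in FALLBACK_ENCODINGS:
        try:
            return data.decode(enc), enc  # strict decoding: fails on invalid bytes
        except UnicodeDecodeError:
            continue
    # Defensive fallback in case every strict decode above fails.
    return data.decode("cp1256", errors="replace"), "cp1256"


def contains_arabic_word(data: bytes, declared: Optional[str] = None) -> bool:
    """Indexing-filter sketch: keep a document if it has at least one Arabic
    letter (assumed here to imply at least one Arabic word)."""
    text, _ = decode_document(data, declared)
    return any(is_arabic_letter(ch) for ch in text)
```

Decoding with a declared charset (for example, from the HTTP `Content-Type` header or an HTML meta tag) is the reliable path. The undeclared fallback is only a guess: an undeclared ISO 8859-1 page such as `b"caf\xe9"` fails strict UTF-8 and is then misread as ISO 8859-6 Arabic, so a production crawler would need charset detection beyond this sketch.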
Similar resources
A Model for Multilingual Search Engine
How to find needed information on the web is a critical issue on the Internet. Fortunately, search engines are useful tools for retrieving information from the Internet. Although Internet users speak different languages, most resources are written and published in English. All Internet search engines provide a lingual search, although some of them enable the searcher to select the language of ...
Multi-Language Text Indexing for Internet Retrieval
We address here the issues associated with indexing multilingual collections of information, as is found for example on the internet. We examine in particular the task of language identification and the use of stemming algorithms for several European languages. We also present the lessons we have learned from our experience in using the SPIDER information retrieval system as a search engine over...
Image Retrieval Using a Multilingual Ontology
Search engines are among the most useful Internet applications. There exist several media types on the Web and, given the particularities of each of them, adapted search solutions are required. We limit our discussion to image search engines. While rapid and robust, existing image search engines offer results that respond only partially to the user’s queries. An improvement of image search resu...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Modern Multilingual and Cross-lingual Information Access Technologies
In this chapter, we describe state-of-the-art cross-lingual and multilingual strategies and their related areas. In particular, we show a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new cross-lingual framework. The framework integrates a cross-lingu...